Introduction
DNS resolution is a critical component for nearly all applications, whether hosted on-premises or in the cloud. Within your Amazon VPCs, the Route 53 Resolver service provides name resolution. As the hub for VPC DNS queries, Route 53 Resolver offers unique insights into the DNS requests made by your VPC resources. With the introduction of Route 53 Resolver query logging, accessing this information has become straightforward. You can determine exactly which names were queried, identify the requesting resource, and view the corresponding responses. This capability is invaluable for troubleshooting, enhancing security, and gaining a deeper understanding of your network.
Route 53 Resolver manages all DNS queries from resources within your VPCs, capable of resolving both public domains and any private hosted zones linked to your VPCs. Moreover, it can be integrated with on-premises environments via Route 53 Resolver endpoints.
In this blog post, we will explore how to automatically parse Route 53 Resolver query logs. Our example shows how to flag any query for a domain that matches a predefined list and how to pinpoint the resource that made it. For instance, this method could help you determine when an EC2 instance issues a DNS query for a hostname known to distribute malware.
Solution Overview
Identifying Queries
When you enable Route 53 Resolver query logging for a VPC, you choose a destination for the logs:
- Amazon S3 bucket
- Amazon CloudWatch Logs
- Amazon Kinesis Data Firehose delivery stream
The first two options are suitable for long-term storage and batch processing of the data. To handle high volumes of incoming streaming log data in near real time, we use a Kinesis Data Firehose delivery stream as the log destination.
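If you prefer to script the logging setup rather than use the console, a minimal boto3 sketch might look like the following; the configuration name, delivery stream ARN, and VPC ID are placeholders to replace with your own values.

import uuid
import boto3

resolver = boto3.client("route53resolver")

# Create a query logging configuration that points at the Firehose delivery stream.
config = resolver.create_resolver_query_log_config(
    Name="vpc-dns-query-logs",
    DestinationArn="arn:aws:firehose:us-east-1:111122223333:deliverystream/r53-query-logs",
    CreatorRequestId=str(uuid.uuid4()),
)["ResolverQueryLogConfig"]

# Associate the configuration with the VPC whose queries should be logged.
resolver.associate_resolver_query_log_config(
    ResolverQueryLogConfigId=config["Id"],
    ResourceId="vpc-0123456789abcdef0",
)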
The diagram below outlines the complete architecture for log parsing.
Figure 1: Architecture Overview
- An EC2 instance, or any other resource in the VPC, sends DNS queries to Route 53 Resolver.
- Route 53 Resolver query logging is set up to forward all DNS query logs to the Kinesis Data Firehose delivery stream.
- An AWS Lambda function named stream_processor processes the incoming logs. It uses the Python TLD library to validate that the queried name has a legitimate top-level domain and to extract the first-level domain from the query.
- The stream_processor function checks whether the first-level domain from the query is present in an Amazon DynamoDB table of interesting domains. This table contains all the domains we want to track queries for.
- Optionally, the stream_processor Lambda function can send notifications to an Amazon Simple Notification Service topic for each match found. Subscribers to the topic can receive notifications via email or SMS.
- Before being sent back to the Kinesis Data Firehose delivery stream, each processed query has a new field named “isMatchedDomain” added to it. If the first-level domain from the query matches an entry in the DynamoDB table, this field is set to ‘Y’; otherwise, it is set to ‘N’. A sketch of this processing logic follows this list.
- Finally, all queries will be forwarded to your selected S3 bucket for further analysis. For instance, you can leverage Amazon Athena to conduct bulk analysis on the S3 data.
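To make this processing step concrete, the following is a minimal sketch of a Kinesis Data Firehose transformation handler along these lines. It is not the code shipped in the repository: the InterestingDomains table name, its domainName partition key, and the environment variable names are assumptions, and error handling is omitted.

import base64
import json
import os

import boto3
from tld import get_fld  # the Python TLD library used by the solution

# Assumed names -- replace with the resources created by the deployment.
TABLE_NAME = os.environ.get("DOMAIN_TABLE_NAME", "InterestingDomains")
TOPIC_ARN = os.environ.get("SNS_TOPIC_ARN")  # optional notification topic

table = boto3.resource("dynamodb").Table(TABLE_NAME)
sns = boto3.client("sns")

def handler(event, context):
    """Kinesis Data Firehose data transformation handler (sketch)."""
    output = []
    for record in event["records"]:
        log_entry = json.loads(base64.b64decode(record["data"]))

        # Reduce the queried name (e.g. "foo.bar.example.com.") to its
        # first-level domain ("example.com").
        query_name = log_entry.get("query_name", "").rstrip(".")
        fld = get_fld(query_name, fix_protocol=True, fail_silently=True)

        # Flag the query if its first-level domain is in the DynamoDB table.
        matched = False
        if fld:
            matched = "Item" in table.get_item(Key={"domainName": fld})
        log_entry["isMatchedDomain"] = "Y" if matched else "N"

        # Optionally notify subscribers about the match.
        if matched and TOPIC_ARN:
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="DNS query matched an interesting domain",
                Message=json.dumps({"domain": fld, "source": log_entry.get("srcaddr")}),
            )

        # Hand the enriched record back to Firehose for delivery to S3.
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode((json.dumps(log_entry) + "\n").encode()).decode(),
        })

    return {"records": output}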
In this context, we are using the same terminology as the Python TLD library. The first-level domain refers to the registered domain (example.com or example.co.uk), while the top-level domain denotes the extension (.com, .co.uk).
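A quick illustration of that terminology, using the library's get_fld and get_tld helpers:

from tld import get_fld, get_tld

# First-level domain: the registered domain.
print(get_fld("foo.bar.example.co.uk", fix_protocol=True))   # example.co.uk

# Top-level domain: the extension.
print(get_tld("foo.bar.example.co.uk", fix_protocol=True))   # co.uk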
The following diagram illustrates an example Route 53 Resolver query log record before and after it is processed by the stream_processor Lambda function. The processed version includes a new field indicating whether the query matched an interesting domain in the DynamoDB table.
Figure 2: Result of Route 53 Resolver Log Processing
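In text form, an abbreviated and purely illustrative record might look like the following after processing; the field names follow the Route 53 Resolver query log format, the values are made up, and only isMatchedDomain is added by the stream_processor function.

{
  "query_timestamp": "2021-02-04T17:51:55Z",
  "vpc_id": "vpc-0123456789abcdef0",
  "query_name": "example.com.",
  "query_type": "A",
  "rcode": "NOERROR",
  "answers": [{"Rdata": "203.0.113.10", "Type": "A", "Class": "IN"}],
  "srcaddr": "10.0.0.25",
  "srcids": {"instance": "i-0abcd1234efgh5678"},
  "isMatchedDomain": "Y"
}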
Importing Interesting Domains (Optional)
The architecture described relies on a DynamoDB table containing a list of interesting domains for query matching. You can manually populate this list or use the Lambda-based mechanism included in the solution. The import process is outlined below.
Figure 3: Importing Interesting Domains to DynamoDB Table
- An administrator uploads a list of domains in text format into an S3 bucket. The import_interesting_domains Lambda function uses regular expressions to extract only the domains from the list, ignoring any non-domain entries.
- The import_interesting_domains Lambda function parses the list, employing the Python TLD library to extract only valid first-level domains, thus preventing the import of non-existent top-level domains.
- Only valid first-level domains are added to the DynamoDB table. For instance, foo.bar.example.com would be imported as example.com. A sketch of this import logic follows this list.
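For orientation, a minimal sketch of such an import function is shown below. It assumes a hypothetical InterestingDomains table with a domainName partition key and an S3 event trigger on the upload bucket; the function in the repository may differ.

import os
import urllib.parse

import boto3
from tld import get_fld

# Assumed table name -- replace with the table created by the deployment.
table = boto3.resource("dynamodb").Table(os.environ.get("DOMAIN_TABLE_NAME", "InterestingDomains"))
s3 = boto3.client("s3")

def handler(event, context):
    """Triggered by an S3 upload of a plain-text domain list (sketch)."""
    for entry in event["Records"]:
        bucket = entry["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(entry["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()

        for line in body.splitlines():
            candidate = line.strip()
            if not candidate:
                continue
            # Keep only entries with a valid, existing TLD, reduced to their
            # first-level domain (foo.bar.example.com -> example.com).
            fld = get_fld(candidate, fix_protocol=True, fail_silently=True)
            if fld:
                table.put_item(Item={"domainName": fld})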
Prerequisites
The solution outlined involves several components—Lambda functions, S3 buckets, a DynamoDB table, an SNS topic, and a Kinesis Data Firehose delivery stream. You must also create appropriate IAM roles and policies to ensure that these components can communicate effectively.
To facilitate the setup process, we utilize the AWS Serverless Application Model (AWS SAM). All components and their interactions are defined in the AWS SAM template, which can be easily deployed to your AWS account.
To use the AWS SAM Command Line Interface (CLI), you need the following tools:
- AWS SAM CLI – Install the AWS SAM CLI
- Python 3.8 installed
Walkthrough
Deploy the AWS SAM Template
- Download the AWS SAM template along with all the code from the AWS Samples GitHub repository.
- Navigate to the directory containing the downloaded files and run the commands below. The README.md describes each parameter in detail. Be sure to use globally unique names when creating S3 buckets.
- Wait for the stack to deploy successfully.
- In the AWS Management Console, navigate to the AWS CloudFormation section in the region specified in the ‘sam deploy’ command. Review the output of the AWS CloudFormation template created by AWS SAM for details about the resources you’ve set up.
# Build an AWS SAM template
sam build
# Deploy AWS SAM template while providing parameters for each component
sam deploy --guided
NOTE: Depending on your environment size and the volume of Route 53 Resolver Query logs, you may need to adjust the deployment parameters accordingly. Check the GitHub repository for specifics on each parameter.
Import Interesting Domains
With all components deployed, you can now populate the DynamoDB table with the first-level domains you want to track. You can import a pre-existing list, use a publicly available open-source list, or manually add domain names directly to the table. Remember that the uploaded file must be in plain-text format.
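For illustration, a hypothetical interesting-domains.txt could contain lines such as the following; only the first two entries would be imported (the second one as example.co.uk), because the last two do not resolve to a valid first-level domain.

example.com
foo.bar.example.co.uk
not-a-domain.invalidtld
this is not a domain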
- To import a list of domains, access the AWS Management Console S3 service and find the S3 bucket created in the previous step, identified by ‘S3InterestingDomainsBucketOutput’.
- Upload the text file to the S3 bucket. Only valid domains in the file will be imported; non-domain entries and domains with invalid or non-existing TLDs will be disregarded.
- Check the DynamoDB table created by your AWS SAM deployment to confirm that it has been populated with domains; the sketch below shows one way to list them. You can also manually add any additional domains.
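One quick way to confirm the import, sketched with boto3 and an assumed table name:

import boto3

# Assumed table name -- use the one created by your AWS SAM deployment.
table = boto3.resource("dynamodb").Table("InterestingDomains")

# Print the first page of imported first-level domains.
for item in table.scan(Limit=25)["Items"]:
    print(item["domainName"])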